Abstract

When categorical responses are simulated from a Multidimensional Many-FACETS Rasch
Compensatory Model (MMFRCM), the effects on ability, task difficulty, and step difficulty
estimates obtained with the unidimensional Many-FACETS Rasch Model (MFRM; Linacre, 1989)
were examined in terms of three error indexes: average absolute difference (AAD), bias, and
root mean square error (RMSE). The results show that violating the unidimensionality
assumption does have an effect on parameter estimation. However, the robustness of the
estimates varies dramatically across parameters and conditions. The conclusion is that the
complex nature of the model and data must be clearly understood to determine under which
conditions the model should be applied and how reliably the parameters associated with the
model can be estimated. This study provides strong evidence about the nature of MFRM
performance when the model assumption is violated.
The Effects of Multidimensional Polytomous Response Data on
Unidimensional Many-FACET Rasch Model Parameter Estimates 

Perspective 

Essay questions and performance tasks are becoming more important and commonplace in
large-scale assessments, such as the Stanford Achievement Test (from Harcourt Educational
Measurement), the Scholastic Aptitude Test (SAT) from ETS, and the TerraNova from
CTB-McGraw Hill. However, essay questions and performance tasks are not without drawbacks:
scoring is expensive, time-consuming, and open to issues of subjectivity.
Both human-rater and automated-scoring methods in large-scale, high-stakes standardized
assessments can raise concerns about the validity and fairness of scoring because raters'
judgments are treated as the only criterion of essay or performance quality (Bennett & Bejar,
1998; Linacre, 1989; Keith, 1998; Mzumara, Shermis, & Fogel, 1998; Powers, Burstein,
Chodorow, Fowles, & Kukich, 2000, 2001).

One solution for describing ordinal rating observations that are ordered qualitatively along
the latent trait of interest is the Many-FACETS Rasch Model (MFRM; Linacre, 1989). The MFRM
is an extension of the partial credit model (Masters, 1982) and is a powerful tool for
constructing linear, objective measures with known precision and quality. The MFRM extends
the possibility of objective measurement to examinations that include subjective judgments.
It also yields greater freedom from judge bias and greater generalizability of the resulting
examinee measures than had previously been available (Linacre, 1989). The MFRM has been used
to analyze rater behavior, patterns of rating in varied performance assessment situations,
and job analysis (Engelhard, 1992, 1994, 1996; Engelhard, Myford, & Cline, 2000; Linacre,
Engelhard, Tatum, & Myford, 1994; Lumley & McNamara, 1995; Lunz & Stahl, 1990; Myford &
Cline, 2002; Wang, 2002). One of the fundamental assumptions of the MFRM and many other IRT
models is that the variable to be measured is unidimensional. In practice, this assumption of
unidimensionality is violated in most testing situations, and testing professionals now agree
that tests are seldom unidimensional (Ackerman, 1992, 1996; Hambleton & Swaminathan, 1985;
Reckase, 1979, 1985, 1997; Stout, 1987; Traub, 1983; Yen, 1984, 1985). Using a unidimensional
IRT model for multidimensional test data might cause lack of fit of the data to the model;
jeopardize the "sample-free", "test-free", and "judge-free" properties of the model; and lead
to incorrect conclusions about the nature of the data being investigated (Ackerman, 1994;
Li & Robert, 2000; Linacre, 1989; Reckase, 1985). Although there have been extensive studies
of applying unidimensional IRT models to multidimensional tests with other IRT models
(Ackerman, 1989; Ansley & Forsyth, 1985; De Ayala, 1994; Drasgow & Parsons, 1983; Folk &
Green, 1989; Harrison, 1986; Luecht & Miller, 1992; Oshima & Miller, 1990; Reckase, 1979,
1987; Way, Ansley, & Forsyth, 1986; Kirisci, Hsu, & Yu, 2000), no attempt has been made to
directly assess the robustness of the MFRM to violations of the unidimensionality assumption
with polytomous response data. Given that the MFRM is widely used to address important issues
in many fields, the consequences of violating unidimensionality when using the MFRM should no
longer be neglected.

The purpose of this empirical study is to examine the consequences for ability, task
difficulty, and step difficulty estimates of fitting the unidimensional MFRM when categorical
responses are simulated from a Multidimensional Many-FACETS Rasch Compensatory Model
(MMFRCM), and to provide some understanding of the nature of the ability, task difficulty,
and step difficulty estimates under violation of unidimensionality.

The Multidimensional Many-FACETS Rasch Compensatory Model
First, the Multidimensional Many-FACETS Rasch Compensatory Model (MMFRCM) was
developed. The MMFRCM is a multidimensional extension of the MFRM (Linacre, 1989). It follows
the distinction made between compensatory and noncompensatory forms of the three-parameter
logistic model (Ansley & Forsyth, 1985; Hattie, 1981; Simpson, 1978). Across all ability
dimensions of an examinee, the MMFRCM specifies a single task difficulty parameter for each
task, a single severity/leniency parameter for each rater (called "scale" in job analysis),
and the same set of step difficulties for the rating categories (the rating category
structure holds across tasks but differs among raters/scales).
The exponential form of the MMFRCM is



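$$
P_{nijk} = \frac{\exp \sum_{h=1}^{r} \sum_{x=0}^{k} \left[ \theta_{nh} - \delta_i - \lambda_j - \tau_{jx} \right]}{\sum_{m=0}^{K} \exp \sum_{h=1}^{r} \sum_{x=0}^{m} \left[ \theta_{nh} - \delta_i - \lambda_j - \tau_{jx} \right]}, \qquad k = 0, 1, \ldots, K \tag{1}
$$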



where

P is the probability of examinee n, for dimension h, on task i being given a rating of
category k by rater j,

$\theta_{nh}$ is the ability parameter for examinee n on dimension h (n from 1 to N; h from 1 to r),

$\delta_i$ is the difficulty parameter for task i (i from 1 to I),

$\lambda_j$ is the severity parameter for rater j (j from 1 to J),

$\tau_{jx}$ is the step difficulty parameter on a rating scale of k categories; in this study,
the rating category structure holds across tasks but differs among raters/scales (x from 0 to K).
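
As a rough illustration of how equation (1) turns parameters into category probabilities, the sketch below assumes an adjacent-category (partial credit style) reading of the model, in which the exponent for category k accumulates the terms (ability - task difficulty - rater severity - step difficulty) over all dimensions h and over steps x = 0, ..., k. The function name `mmfrcm_probs`, the convention that the step difficulty for x = 0 is zero, and the example values are assumptions made for illustration; they are not taken from the study.

```python
import math


def mmfrcm_probs(thetas, delta, lam, taus):
    """Category probabilities for one examinee, one task, and one rater.

    thetas: abilities of the examinee on each of the r dimensions
    delta:  difficulty of the task
    lam:    severity of the rater
    taus:   step difficulties tau_0..tau_K for this rater (tau_0 = 0 assumed)
    """
    exponents = []
    running = 0.0
    for tau in taus:
        # Add the step-x term, summed over all ability dimensions.
        running += sum(theta - delta - lam - tau for theta in thetas)
        exponents.append(running)
    # Subtract the largest exponent before exponentiating, for stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only: two dimensions, five rating categories.
probs = mmfrcm_probs([0.5, -0.2], delta=0.0, lam=0.1,
                     taus=[0.0, -0.2, -0.05, 0.05, 0.2])
print([round(p, 3) for p in probs])
```

Because the abilities enter as a sum over dimensions, a low ability on one dimension can be offset by a high ability on another, which is the compensatory property the model name refers to.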

Method and Data 

Design 

To examine the effects of multidimensional polytomous response data on the MFRM
parameter estimates, five factors were manipulated, with two or three levels selected for
each. The five independent variables were: (1) number of ability dimensions (one, two, and
three), (2) sample size of examinees (500, 1000, 2000), (3) degree of ability correlation
(0, .3, and .7), (4) number of tasks (40 and 80), and (5) number of raters/scales (one, two,
and three). For two raters/scales, the same 5 step difficulties are -.2, -.05, .05, .2. For
three raters/scales, the first two raters/scales have the same 5 step difficulties
(-.2, -.05, .05, .2) and the third rater/scale has step difficulties -1.5, 0, 1.5. Three error
indexes, average absolute difference (AAD), bias, and root mean square error (RMSE), were used
as dependent variables for evaluating the effect of the simulation. For the purpose of
comparison, responses from a unidimensional MFRM were also generated in the study.
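
The three error indexes can be sketched as follows. The paper does not spell out their formulas, so the code assumes the conventional definitions: AAD as the mean absolute difference between estimate and true value, bias as the mean signed difference, and RMSE as the square root of the mean squared difference. The function name `error_indexes` and the example numbers are hypothetical.

```python
import math


def error_indexes(estimates, truths):
    """Return (AAD, bias, RMSE) for paired estimates and true parameters."""
    diffs = [est - true for est, true in zip(estimates, truths)]
    n = len(diffs)
    aad = sum(abs(d) for d in diffs) / n          # average absolute difference
    bias = sum(diffs) / n                         # mean signed difference
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return aad, bias, rmse


# Made-up estimates and true values, for illustration only.
aad, bias, rmse = error_indexes([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(round(aad, 4), round(bias, 4), round(rmse, 4))
```

Under these definitions bias can be near zero while AAD and RMSE are large, because positive and negative errors cancel only in the signed mean.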

Five replications of each of the (1 one-dimension + 3 two-dimension + 4 three-dimension) x 3
sample sizes x 3 degrees of correlation x 2 numbers of tasks = 144 total combinations (cells)
were run. Following a suggestion from past research (Harwell, Stone, Hsu, & Kirisci, 1996),
both descriptive and inferential procedures were used to summarize the simulation results.

Simulation procedure.

Given the parameters defined by the specifications above, the steps involved in the
simulation process are:

Step 1, samples of 500, 1000, and 2000 vectors of true abilities were generated from a
multivariate normal distribution with specified intercorrelations (2D: $\rho_{12}$ = 0, .3,
and .7; 3D: $\rho_{123}$ = (0, 0, 0), (0, 0, .3), (0, 0, .7), and (0, .3, .7)) using the
Cholesky factorization procedure (Timm, 1997). For the unidimensional case, samples of
true abilities of the same sizes were generated from the standard normal distribution.

Step 2, the known parameters ($\theta$, $\delta$, $\lambda$, and $\tau$) were used to
calculate, using equation (1), the probability that each simulated examinee, for each
dimension, on each task rated by each rater, receives a rating of category k.

Step 3, the generated probabilities from step 2 were compared to a uniform (0,1) random number
to produce responses in specific categories.

A different random number was used as the seed for each of the five replications.
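
The three steps above can be sketched in code under stated assumptions. Step 1 uses a hand-written Cholesky factor of the correlation matrix; step 2 assumes a partial credit style reading of equation (1) with the step difficulty for x = 0 set to zero; step 3 assumes the usual cumulative-probability comparison with a single uniform draw. All function names, the seed, and the single task and rater with difficulty and severity of 0 are hypothetical choices, not details of the original simulation.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L with L * L^T equal to a positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def draw_abilities(n_examinees, corr, rng):
    """Step 1: correlated ability vectors, theta = L z with z standard normal."""
    lower = cholesky(corr)
    dims = len(corr)
    sample = []
    for _ in range(n_examinees):
        z = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        sample.append([sum(lower[h][k] * z[k] for k in range(h + 1))
                       for h in range(dims)])
    return sample


def category_probs(thetas, delta, lam, taus):
    """Step 2: category probabilities (assumed partial credit style form)."""
    exponents, running = [], 0.0
    for tau in taus:
        running += sum(theta - delta - lam - tau for theta in thetas)
        exponents.append(running)
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def draw_rating(probs, rng):
    """Step 3: locate one U(0,1) draw within the cumulative probabilities."""
    u = rng.random()
    cumulative = 0.0
    for category, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return category
    return len(probs) - 1  # guard against floating-point rounding


rng = random.Random(1)                 # hypothetical seed; one per replication
corr = [[1.0, 0.3], [0.3, 1.0]]        # two dimensions, correlation .3
taus = [0.0, -0.2, -0.05, 0.05, 0.2]   # tau_0 = 0 is an assumption
abilities = draw_abilities(500, corr, rng)
# Ratings of every examinee on one task (delta = 0) by one rater (lambda = 0).
ratings = [draw_rating(category_probs(t, 0.0, 0.0, taus), rng) for t in abilities]
```

A full replication would loop this over all tasks and raters and then pass the ratings to the calibration program; that part is outside the sketch.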




Results 
The responses from step 3 were calibrated using the FACETS computer program (Linacre, 1996,
1998) to obtain parameter estimates. For ability, the unidimensional estimates of ability
were correlated with both the individual and average true ability parameters, and SE and RMSE
were calculated. For task, the unidimensional estimates of task difficulty were correlated
with the true task difficulty parameters, and SE and RMSE were calculated.

Ability Estimation 

Tables 1 to 5 show the means and standard deviations (SD) of the AAD, bias, and RMSE of
ability estimates for the unidimensional, two-dimensional, and three-dimensional conditions.
These results suggest that, in general, as the number of dimensions increases and the number
of tasks decreases, the AAD, bias, and RMSE of ability estimates increase. The AAD, bias, and
RMSE computed between individual true abilities and estimates are larger than those computed
between average true abilities and estimates.

Average (over replications and number of raters/scales) correlations between the estimated
ability $\hat{\theta}$ and the first true ability $\theta_1$, the second true ability
$\theta_2$, and the average true ability $\theta_{avg}$ for two-dimensional data are
presented in Table 6. For the unidimensional data set, the correlation between true and
estimated ability is higher than that for two-dimensional data. As the correlation
$\rho(\theta_1, \theta_2)$ between true abilities increased, the correlation
$r_{\hat{\theta}\theta_1}$ increased too, but this is not necessarily true for
$r_{\hat{\theta}\theta_2}$. The $\hat{\theta}$ values were highly related to the averages of
the true $\theta$s only when the values of $\rho(\theta_1, \theta_2)$ were 0 and 0.3.
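
A minimal sketch of the comparison described here: the unidimensional ability estimate is correlated with each true ability dimension and with the average of the true abilities. The helper names `pearson` and `ability_correlations` and the toy values are hypothetical; the study's own correlations come from FACETS calibrations, not from this code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ability_correlations(estimates, true_abilities):
    """Correlate estimates with each true dimension and with their average.

    true_abilities: one list [theta_1, ..., theta_r] per examinee.
    """
    dims = len(true_abilities[0])
    by_dimension = [pearson(estimates, [t[h] for t in true_abilities])
                    for h in range(dims)]
    averages = [sum(t) / dims for t in true_abilities]
    return by_dimension, pearson(estimates, averages)


# Toy values for three examinees on two dimensions, for illustration only.
by_dimension, with_average = ability_correlations(
    [1.0, 2.0, 3.0], [[1.0, 0.0], [2.0, 5.0], [3.0, 1.0]])
print(by_dimension, with_average)
```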

Table 7 shows the results of the three-way ANOVA of AAD, bias, and RMSE (averaged
across replications) for the unidimensional data set. The three factors, number of raters
(NR, or scales), sample size (SS), and number of tasks (NT), have different effects on the
AAD, bias, and RMSE of ability estimates. Neither the two-factor interactions nor the
three-factor interaction is statistically significant. The main effects of NR on AAD and bias
are statistically significant at the 0.05 level. The effect of NT is statistically
significant. NR has the most influence on AAD and bias: it accounted for 9% of the total
variance of AAD and 25.3% of the total variance of bias.

Tables 8 and 9 present the three-way ANOVA of AAD, bias, and RMSE (averaged across
replications) for the two- and three-dimensional data sets. The three factors manipulated
were correlation between true abilities, number of raters (or scales), and sample size. For
two-dimensional data, none of the interaction effects is statistically significant. Although
the main effect of correlation is statistically significant for RMSE, this factor has
practically no effect on ability estimation because of its low value of $\eta^2$, the
percentage of the total variance it explains. For three-dimensional data, some interaction
effects are statistically significant but have very low values of $\eta^2$. The main effects
of correlation are statistically significant for AAD, bias, and RMSE, but they account for
very small proportions of the total variance.
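
The proportion-of-variance reading of the ANOVA results can be illustrated with classical eta squared, the sum of squares for an effect divided by the total sum of squares. The sketch assumes that this is the index reported in the tables; the function name and the sums of squares below are hypothetical and are not values from the study.

```python
def eta_squared(sums_of_squares):
    """Map each ANOVA source to its share of the total sum of squares."""
    total = sum(sums_of_squares.values())
    return {source: ss / total for source, ss in sums_of_squares.items()}


# Hypothetical sums of squares, for illustration only.
shares = eta_squared({"NR": 2.0, "SS": 0.5, "NT": 1.5, "error": 4.0})
print(shares["NR"])  # 0.25, i.e. 25% of the total variance
```

An effect can be statistically significant and still have a small share, which is the distinction the text draws between statistical and practical significance.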

Task and Step Estimations 

Tables 11 and 12 show the correlations between task estimates and true task parameters
under different conditions. First, the number of tasks has no effect on the means and SDs of
the average correlations. The only effect is that of the number of raters (or scales).
However, this decrease is due to the number of steps, which is confounded with the number of
raters. With one or two raters, the number of steps is five; the added third rater used 3
steps instead of 5. Although this confounding means that either factor could be credited with
the changes in the correlations for task estimation, the real factor should be the number of
steps rather than the number of raters, because there is no difference in correlation between
one rater and two raters.




Practical Implication 



This empirical study is the first to systematically examine unidimensional parameter
estimates derived from two- and three-dimensional data when the Many-FACETS Rasch Model is
used. It appears that violating the unidimensionality assumption does have an effect on
parameter estimation. However, the robustness of the estimates varies dramatically across
parameters and conditions. In this study, among all factors, the number of raters had the
largest effect on AAD, bias, and RMSE, and the sample size had the smallest effect. The
number of steps and the number of tasks had moderate effects on AAD, bias, and RMSE. Given
that the MFRM is widely used in educational, psychological, health, and licensure and
certification assessments, the complex nature of the model and data must be clearly
understood to determine under which conditions the model should be applied and how reliably
the parameters associated with the model can be estimated. This study provides strong
evidence about the nature of MFRM performance when the model assumption is violated.

Means and Standard Deviations (over replication) of the AAD, Bias, and RMSE of Ability Parameter Estimations Based on Three
Dimensional Data (All estimates were compared to average true ability)

[Table values not recoverable from the source.]

Results of ANOVA for Unidimensional Data

Results of ANOVA for Two Dimensional Data Based on Estimates and First True Ability

Results of ANOVA for Three Dimensional Data Based on Estimates and First True Ability

Table 10

Means and Standard Deviations (over replication) of Average Correlations between Task
Estimate and True Task Difficulty for Two Dimensional Data

Columns: Dimension; Correlation $\rho(\theta_1, \theta_2)$; No. Rater; No. Task; Mean; SD

[Table body only partly legible. Rows cover two dimensions; correlations of 0, .3, and .7;
one, two, and three raters; and 40 and 80 tasks. The legible entries are means of 1.00
(SD .00) and, in the final rows, .97 (SD .01).]

Table 11

Means and Standard Deviations (over replication) of Average Correlations between Task
Estimate and True Task Difficulty for Three Dimensional Data

[Table body not recoverable from the source. Legible column headers: No. Task, Mean, SD.
Legible correlation patterns: (0, 0, 0) and (0, 0, .3).]